Overview

Dataset Statistics

Number of Variables 9
Number of Rows 20098
Missing Cells 7764
Missing Cells (%) 4.3%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 7.8 MB
Average Row Size in Memory 407.0 B
Variable Types
  • Numerical: 4
  • Categorical: 4
  • DateTime: 1
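
The missing-cell percentage can be reproduced from the counts above. The sketch below assumes that a "cell" is one row-by-variable entry, so the denominator is rows times variables. The variable names are illustrative and not part of the report.

```python
# Figures copied from the Dataset Statistics block.
n_rows = 20098
n_variables = 9
missing_cells = 7764

# Per-column missing counts as listed under Dataset Insights.
missing_by_column = {"kilometers": 212, "extra_features": 7552}

total_cells = n_rows * n_variables
missing_pct = 100 * missing_cells / total_cells

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"accounted for by listed columns: {sum(missing_by_column.values())}")
```

The two columns flagged as Missing account for every missing cell, so the remaining seven variables are complete.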

Dataset Insights

  • id is uniformly distributed (Uniform)
  • kilometers has 212 (1.05%) missing values (Missing)
  • extra_features has 7552 (37.58%) missing values (Missing)
  • id is skewed (Skewed)
  • model_year is skewed (Skewed)
  • kilometers is skewed (Skewed)
  • price is skewed (Skewed)
  • extra_features has a high cardinality: 7437 distinct values (High Cardinality)

Variables


id

numerical

Approximate Distinct Count 20098
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 321568
Mean 10048.5
Minimum 0
Maximum 20097
Zeros 1
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • id is uniformly distributed

Quantile Statistics

Minimum 0
5-th Percentile 1004.85
Q1 5024.25
Median 10048.5
Q3 15072.75
95-th Percentile 19092.15
Maximum 20097
Range 20097
IQR 10048.5

Descriptive Statistics

Mean 10048.5
Standard Deviation 5801.9372
Variance 3.3662e+07
Sum 2.0195e+08
Skewness 0
Kurtosis -1.2
Coefficient of Variation 0.5774
  • id is not normally distributed (p-value 1.0184356610017269e-18)
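
The figures for id are what a plain row index would produce. As a rough check, assuming id takes every integer from 0 to 20097 exactly once (consistent with the reported minimum, maximum and 100% uniqueness, but not stated outright), the closed-form moments of a discrete uniform distribution reproduce the reported values. The quantile helper assumes linear interpolation between order statistics.

```python
import math

n = 20098  # number of rows; id is assumed to be 0, 1, ..., n - 1

mean = (n - 1) / 2
# Unbiased (ddof=1) variance of the integers 0..n-1 simplifies to n(n+1)/12.
sample_variance = n * (n + 1) / 12
sample_std = math.sqrt(sample_variance)
# Population excess kurtosis of a discrete uniform distribution on n points.
excess_kurtosis = -6 * (n**2 + 1) / (5 * (n**2 - 1))


def quantile(q):
    """q-quantile of 0..n-1 under linear interpolation: position q * (n - 1)."""
    return q * (n - 1)


print(f"mean      {mean}")
print(f"std       {sample_std:.4f}")
print(f"kurtosis  {excess_kurtosis:.4f}")
print(f"Q1        {quantile(0.25)}")
print(f"Q3        {quantile(0.75)}")
```

If these match, id carries no information beyond row order and can be dropped from any modelling step.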

make

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1432982
  • The largest value (Nissan) is over 39.04 times larger than the second largest value (Nissan Motor Egypt)

Length

Mean 6.2997
Standard Deviation 1.8727
Median 6
Minimum 6
Maximum 18

Sample

1st row Nissan
2nd row Nissan
3rd row Nissan
4th row Nissan
5th row Nissan

Letter

Count 125608
Lowercase Letter 104506
Space Separator 1004
Uppercase Letter 21102
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Nissan, Nissan Motor Egypt) take over 50.0%
  • The largest value (nissan) is over 40.04 times larger than the second largest value (egypt)
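
The two ratios differ because the first compares whole values and the second compares individual words. The following sketch rebuilds the column under two assumptions that the report supports but does not state: the only values are "Nissan" and "Nissan Motor Egypt", and every space belongs to the longer value (two per row). The row counts it derives are inferred, not reported.

```python
from collections import Counter

n_rows = 20098
space_separators = 1004  # from the Letter block

# Assumption: each "Nissan Motor Egypt" row contributes exactly two spaces.
long_rows = space_separators // 2
short_rows = n_rows - long_rows

values = ["Nissan"] * short_rows + ["Nissan Motor Egypt"] * long_rows

# Value-level comparison: whole strings.
value_counts = Counter(values)
value_ratio = value_counts["Nissan"] / value_counts["Nissan Motor Egypt"]

# Word-level comparison: lowercase tokens, so "nissan" is counted in every row.
word_counts = Counter(word.lower() for v in values for word in v.split())
word_ratio = word_counts["nissan"] / word_counts["egypt"]

mean_length = sum(len(v) for v in values) / n_rows

print(f"value ratio {value_ratio:.2f}, word ratio {word_ratio:.2f}")
print(f"mean length {mean_length:.4f}")
```

Under these assumptions the reconstruction also reproduces the reported mean length, which suggests the two spellings refer to the same make and could be merged before analysis.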

model

categorical

Approximate Distinct Count 6
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1413901
  • The largest value (Sunny) is over 4.71 times larger than the second largest value (Qashqai)

Length

Mean 5.3503
Standard Deviation 0.7834
Median 5
Minimum 4
Maximum 14

Sample

1st row Juke
2nd row Juke
3rd row Juke
4th row Juke
5th row Juke

Letter

Count 107529
Lowercase Letter 87429
Space Separator 2
Uppercase Letter 20100
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Sunny, Qashqai) take over 50.0%
  • The largest value (sunny) is over 4.71 times larger than the second largest value (qashqai)

model_year

numerical

Approximate Distinct Count 44
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 321568
Mean 2016.1595
Minimum 1918
Maximum 2024
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • model_year is skewed left (γ1 = -1.9191)

Quantile Statistics

Minimum 1918
5-th Percentile 2008
Q1 2014
Median 2017
Q3 2020
95-th Percentile 2022
Maximum 2024
Range 106
IQR 6

Descriptive Statistics

Mean 2016.1595
Standard Deviation 4.865
Variance 23.6685
Sum 4.0521e+07
Skewness -1.9191
Kurtosis 12.4276
Coefficient of Variation 0.002413
  • model_year is not normally distributed (p-value 1.6753274597846486e-09)
  • model_year has 538 outliers
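
The report does not say which rule produced the outlier count. A common convention, used here purely as an assumption, is Tukey's rule, which flags values more than 1.5 times the IQR outside the quartiles. The helper name tukey_fences is hypothetical, and whether this rule reproduces the 538 reported outliers cannot be checked from the summary alone.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) bounds; values outside them count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles copied from the Quantile Statistics block for model_year.
lower, upper = tukey_fences(q1=2014, q3=2020)
print(lower, upper)  # 2005.0 2029.0
```

With a maximum of 2024, nothing exceeds the upper bound, so under this rule every flagged car would be older than 2005, including the implausible 1918 minimum.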

kilometers

numerical

Approximate Distinct Count 657
Approximate Unique (%) 3.3%
Missing 212
Missing (%) 1.1%
Infinite 0
Infinite (%) 0.0%
Memory Size 318176
Mean 97220.4498
Minimum 0
Maximum 1.72e+07
Zeros 260
Zeros (%) 1.3%
Negatives 0
Negatives (%) 0.0%
  • kilometers is skewed right (γ1 = 87.8501)

Quantile Statistics

Minimum 0
5-th Percentile 9999
Q1 41000
Median 90000
Q3 139999
95-th Percentile 200000
Maximum 1.72e+07
Range 1.72e+07
IQR 98999

Descriptive Statistics

Mean 97220.4498
Standard Deviation 143404.7989
Variance 2.0565e+10
Sum 1.9333e+09
Skewness 87.8501
Kurtosis 10253.1172
Coefficient of Variation 1.475
  • kilometers is not normally distributed (p-value 4.227399821996957e-25)
  • kilometers has 108 outliers
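
Moment-based skewness is sensitive to a handful of extreme values, and the maximum of 1.72e+07 is far beyond the 95th percentile of 200000. A quartile-based measure (Bowley's coefficient, introduced here as a complementary check and not part of the report) uses only Q1, the median and Q3.

```python
def bowley_skewness(q1, median, q3):
    """Quartile skewness in [-1, 1]; 0 means the quartiles are symmetric about the median."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Quartiles copied from the Quantile Statistics block for kilometers.
km_skew = bowley_skewness(q1=41000, median=90000, q3=139999)
print(f"{km_skew:.4f}")  # 0.0101
```

A value this close to zero suggests the central half of the distribution is nearly symmetric and that the skewness of 87.8501 is driven by the far tail, most likely data-entry errors.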

transmission_type

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1483340
  • The largest value (Automatic) is over 14.41 times larger than the second largest value (Manual)

Length

Mean 8.8054
Standard Deviation 0.739
Median 9
Minimum 6
Maximum 9

Sample

1st row Automatic
2nd row Automatic
3rd row Automatic
4th row Automatic
5th row Automatic

Letter

Count 176970
Lowercase Letter 156872
Space Separator 0
Uppercase Letter 20098
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Automatic, Manual) take over 50.0%
  • The largest value (automatic) is over 14.41 times larger than the second largest value (manual)

extra_features

categorical

Approximate Distinct Count 7437
Approximate Unique (%) 59.3%
Missing 7552
Missing (%) 37.6%
Memory Size 3386539

Length

Mean 204.9298
Standard Deviation 100.5367
Median 187
Minimum 3
Maximum 587

Sample

1st row Air Conditioning; ...
2nd row ABS; Air Condition...
3rd row ABS; Air Condition...
4th row ABS; Air Condition...
5th row ABS; Air Condition...

Letter

Count 2038984
Lowercase Letter 1626994
Space Separator 342604
Uppercase Letter 411990
Dash Punctuation 8339
Decimal Number 0
  • The largest value (power) is over 2.1 times larger than the second largest value (air)
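
The Letter block appears to count characters by Unicode general category, with Count equal to the lowercase and uppercase totals combined (1626994 + 411990 = 2038984). Below is a minimal sketch of that tally, assuming this reading is right. The function name and the sample string are hypothetical and not taken from the dataset.

```python
import unicodedata

# Unicode general categories mapped to the labels used in the report.
CATEGORY_LABELS = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def char_category_counts(values):
    """Tally characters per category over an iterable of strings, skipping None."""
    counts = {label: 0 for label in CATEGORY_LABELS.values()}
    for text in values:
        if text is None:  # missing cells contribute nothing
            continue
        for ch in text:
            label = CATEGORY_LABELS.get(unicodedata.category(ch))
            if label is not None:
                counts[label] += 1
    # Assumed definition: "Count" is the number of letters of either case.
    counts["Count"] = counts["Lowercase Letter"] + counts["Uppercase Letter"]
    return counts


# Hypothetical feature string, not a row from the dataset.
sample = ["Sunroof; Anti-theft alarm", None]
print(char_category_counts(sample))
```

Characters outside the five listed categories, such as the semicolon delimiter, are ignored, which is why the category totals do not add up to the total string length.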

price

numerical

Approximate Distinct Count 892
Approximate Unique (%) 4.4%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 321568
Mean 274005.1249
Minimum 0
Maximum 25895000
Zeros 117
Zeros (%) 0.6%
Negatives 0
Negatives (%) 0.0%
  • price is skewed right (γ1 = 60.6145)

Quantile Statistics

Minimum 0
5-th Percentile 117000
Q1 176000
Median 245000
Q3 336000
95-th Percentile 518000
Maximum 25895000
Range 25895000
IQR 160000

Descriptive Statistics

Mean 274005.1249
Standard Deviation 242835.7443
Variance 5.8969e+10
Sum 5.507e+09
Skewness 60.6145
Kurtosis 6196.0758
Coefficient of Variation 0.8862
  • price is not normally distributed (p-value 4.836911480620255e-25)
  • price has 685 outliers

priced_at

datetime

Approximate Distinct Count 226.3906
Approximate Unique (%) 1.1%
Missing 0
Missing (%) 0.0%
Memory Size 160912
Minimum 2022-02-02 00:00:00
Maximum 2023-04-30 00:00:00
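
For context on the distinct count, the span between the minimum and maximum can be computed directly. This is a small sketch using only the two timestamps shown. Comparing the span with the roughly 226 distinct values assumes at most one timestamp per calendar day, as the midnight times suggest.

```python
from datetime import datetime

first = datetime.fromisoformat("2022-02-02 00:00:00")
last = datetime.fromisoformat("2023-04-30 00:00:00")

span_days = (last - first).days   # elapsed days between the two timestamps
calendar_days = span_days + 1     # counting both endpoints

print(span_days, calendar_days)  # 452 453
```

Under that assumption, about half of the calendar days in the range have at least one priced listing.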
